Multimodal Speaker Recognition in a Conversation Scenario
Abstract
As a step toward the design of a robot that can take part in a conversation, we propose a robotic system that uses multiple perceptual capabilities to actively follow a conversation among several human subjects. The essential idea of our proposal is that the robot can dynamically shift the focus of its attention in response to visual or audio stimuli, so that it tracks the current speaker throughout the conversation and infers that speaker's identity.
Similar resources
Triggering Memories of Conversations using Multimodal Classifiers
Our personal conversation memory agent is a wearable ‘experience collection’ system, which unobtrusively records the wearer’s conversation, recognizes the face of the dialog partner and remembers his/her voice. When the system sees the same person’s face or hears the same voice it uses a summary of the last conversation with this person to remind the wearer. To correctly identify a person and h...
MEMN: Multimodal Emotional Memory Network for Emotion Recognition in Dyadic Conversational Videos
Multimodal emotion recognition is a developing field of research which aims at detecting emotions in videos. For conversational videos, current methods mostly ignore the role of inter-speaker dependency relations while classifying emotions. In this paper, we address recognizing utterance-level emotions in dyadic conversations. We propose a deep neural framework, termed Multimodal Emotional Memo...
Achieving Multimodal Cohesion during Intercultural Conversations
How do English as a lingua franca (ELF) speakers achieve multimodal cohesion on the basis of their specific interests and cultural backgrounds? From a dialogic and collaborative view of communication, this study focuses on how verbal and nonverbal modes cohere together during intercultural conversations. The data include approximately 160-minute transcribed video recordings of ELF interactions ...
Meta-Classification of Multimedia Classifiers
Combining multiple classifiers is of particular interest in multimedia systems, since there is usually data of very different types/modalities that should be mined or analyzed. Our wearable ‘experience collection’ system unobtrusively records the wearer’s conversation, recognizes the face of the dialog partner and remembers his/her voice. When the system sees the same person’s face or hears...
BioSec Multimodal Biometric Database in Text-Dependent Speaker Recognition
In this paper we briefly describe the BioSec multimodal biometric database and analyze its use in automatic text-dependent speaker recognition research. The paper is structured into four parts: a short introduction to the problem of text-dependent speaker recognition; a brief review of other existing databases, including monomodal text-dependent speaker recognition databases and multimodal biom...